Journal of Biomedical Informatics
Top medRxiv preprints most likely to be published in this journal, ranked by match strength.
Objective: Named Entity Recognition (NER) and Biomedical Entity Linking (BEL) are essential for transforming unstructured Electronic Health Records (EHRs) into structured information. However, tools for these tasks are limited in non-English biomedical texts such as Dutch and Italian. This study investigates the use of prompt-based learning with Large Language Models (LLMs) to perform multilingual NER and BEL using minimal domain-specific data, while addressing annotation preservation during corpus...
Objective: Much medical data is only available in unstructured electronic health records (EHRs). These data can be obtained through manual (human) extraction or programmatic natural language processing (NLP) methods. We estimate that NLP only becomes economically competitive with manual extraction when there are ~6500 EHR records. We have found that there is interest from clinicians and researchers in using NLP on projects with fewer records. We examine whether a large language model (LLM) can be ...
Motivation: Medical documents are a crucial resource for medical research around the world. While troves of valuable health data exist, they are largely computationally inaccessible as hard copies of unstructured text. Moreover, the persistent prevalence of fax machines in medical settings contributes to further degradation of document quality. Digitization of these resources through manual data extraction is time-consuming and resource intensive. However, large language models (LLMs) have recentl...
Objective: Adverse events (AEs) resulting from medical interventions are significant contributors to patient morbidity, mortality, and healthcare costs. Prediction of these events using electronic health records (EHRs) can facilitate timely clinical interventions. However, effective prediction remains challenging due to severe class imbalance, missing labels, and the complexity of EHR records. Classical machine learning approaches frequently underperform due to insufficient representation of minor...
Electronic health records (EHRs) provide a large source of data that can be used for research purposes. Extraction of information from unstructured clinical notes in EHRs can be automated by large language models (LLMs). Although LLMs are promising for this task, challenges remain in reliable application of LLMs to EHR, including the lack of development and validation for languages other than English. Here, we identified Dutch LLMs and compared their performance in a case study. We selected the ...
Medical errors are one of the leading causes of death in the United States. Several public databases have been built to record patient safety events across healthcare systems to better understand and improve safety hazards. These reports typically include both structured fields (e.g., event type, device, manufacturer) and unstructured data elements (free text narrative of what happened). The structured fields are usually restricted to a limited number of categories, whereas the unstructured fiel...
Cross-jurisdictional pharmaceutical compliance requires comparative analysis of regulatory requirements across jurisdictions such as the US FDA and China's NMPA. Although large language models (LLMs) are increasingly explored for healthcare-related applications, their performance in cross-jurisdictional regulatory comparison has not been systematically characterized using dedicated benchmarks. This study introduces Sino-US-DrugQA, a bilingual benchmark dataset designed to evaluate LLM performance...
Purpose: Natural Language Processing (NLP) has the potential to extract structured clinical knowledge from unstructured Electronic Health Records (EHRs). However, the limited availability of annotated datasets for algorithm training restricts its application in clinical practice. This study investigates the use of transformer-based NLP models to structure Italian EHRs in cardiac settings, addressing this gap. Methods: We implemented and evaluated three named entity recognition algorithms: SpaCy, Fl...
Medication product names in Swiss electronic health records are heterogeneous and often encode multiple attributes (e.g., ingredient, strength, dose form, packaging) in German free text. This limits interoperability and reduces the utility of ATC codes, which do not uniquely identify products. We compared two workflows for mapping Swiss medication products to RxNorm and RxNorm Extension: (i) an Observational Health Data Sciences and Informatics (OHDSI) USAGI workflow with lexical similarity and ...
Background: Integrating advanced artificial intelligence (AI) into clinical decision-support often requires the sharing of sensitive patient data with external services, raising privacy concerns. Homomorphic encryption (HE) allows computing directly on encrypted data, without revealing the underlying patient information. Objectives: To develop a large language model (LLM)-assisted diagnosis framework while preserving patient privacy in the clinical text analysis, by leveraging HE and using rare dis...
Generative artificial intelligence (GenAI) applications have been at the forefront of clinical documentation assistants, aiming to reduce physician note-taking burden. However, GenAI systems are resource-intensive, and deployment in low-resource healthcare settings can be challenging and cost-prohibitive. We present a symbolic reasoning model (SRM) for detecting chief complaints from clinical conversations and evaluate it against two large language models (LLMs), Gemma2-9b and Llama3.3-70B-Versat...
The critical need for accessible patient data in clinical research is often hindered by privacy regulations and data scarcity. While synthetic data generation offers a promising solution, existing generative models face key limitations. GANs can suffer from training instability, while diffusion models typically process records independently and often neglect the local neighborhood structure of the data manifold. To address this gap, we introduce TabGraphSyn, a two-stage generative framework for ...
Objective: Systematic clinical phenotyping using Human Phenotype Ontology (HPO) is central to rare disease diagnosis. However, current disease prioritization (ranking candidate diseases from HPO for a patient) methods face key challenges: they often fail to account for the hierarchical structure of HPO terms, ignore dependencies among correlated terms, and do not adjust for batch effects arising from systematic differences in phenotype documentation across cohorts, institutions, or clinicians. We ...
IMPORTANCE: Although angiotensin-converting enzyme inhibitors (ACEIs) and angiotensin receptor blockers (ARBs) are recommended for people with chronic kidney disease (CKD), they remain underused. Barriers to adherence, such as adverse effects or patient refusal, are frequently embedded within unstructured clinical narratives and are therefore inaccessible to structured data analytics. Scalable natural language processing (NLP) approaches are needed to identify these barriers and support guideline-...
Clinical decision making often relies on expert judgment guided by established guidelines, which can be challenging to standardize and abstract to implement. For example, selecting between gene panels and whole exome/genome sequencing (WES/WGS) for rare disease diagnosis frequently requires interpretation of evidence-based recommendations from the American College of Medical Genetics and Genomics (ACMG) guideline. Traditional machine learning (ML) models predicting suitable genetic tests often f...
Ambient AI documentation tools generate draft notes that clinicians can review and edit before signing off in electronic health records. Scalable computational approaches to characterize how clinicians modify drafts remain limited, yet are essential for evaluating and improving AI effectiveness. We examined the feasibility of a few-shot prompted large language model (LLM) for categorizing sentence-level edits between AI drafts and final documentation. We developed five label-specific binary mode...
We present ARCADE (Adversarial Critique Architecture for Document Evaluation), a multi-agent architecture addressing three limitations of traditional retrieval-augmented generation for automated document analysis: incomplete information extraction, shallow analytical depth, and framework paraphrasing. We compared ARCADE against Single-Pass RAG using 95 policy documents (50 National Cancer Control Plans and 45 Cardiovascular Disease plans) evaluated on 36 metrics across six capabilities: Natural ...
Large language models (LLMs) are increasingly explored as tools for healthcare research and data analysis. However, their applicability to structured public health datasets, especially in non-English contexts, remains underexamined. We systematically evaluated 11 state-of-the-art LLMs on their ability to generate executable Python code for analytical queries over Czech public health datasets, focusing on incidence and prevalence data provided by the National Health Information Portal (known as N...
Medical large language models (LLMs) achieving high benchmark accuracy exhibit unexplained variability in clinical tasks, producing errors that clinicians cannot safeguard against. We evaluated clinical reasoning stability in GPT-5, MedGemma-27B-Text-IT, and OpenBioLLM-Llama3-70B using 355 systematic perturbations of physician-validated oncology cases and trained sparse autoencoders on 1 billion tokens from 50,000 MIMIC-IV clinical notes to decompose their internal representation. We find models...
Achieving timely diagnosis for rare diseases remains challenging due to, among others, phenotypic heterogeneity and incomplete clinical data. While the Solve-RD project developed a phenotype-based gene prioritisation method, this approach did not account for the clinical consistency among related diseases in Orphanet's hierarchical classifications. We present a phenotype-based computational pipeline that ranks candidate ORPHAcodes based on patient phenotypes. The pipeline computes patient-diseas...